2024-09-22

EUfootball Library

  • This library contains match results from seven European men’s football leagues: Premier League (England), Ligue 1 (France), Bundesliga (Germany), Serie A (Italy), Primera Division (Spain), Eredivisie (The Netherlands), and Super Lig (Turkey). It covers seasons 2010/2011 through 2019/2020 and includes additional covariates.
  • The Elo rating of each team is stored as eloHome for the home team and eloGuest for the away (guest) team.
  • League identifies the league in which the two teams play.
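As a minimal sketch of how one club's matches could be pulled out of such a table, the snippet below filters a small made-up data frame. The column names teamHome and teamGuest and all values are assumptions for illustration; only eloHome, eloGuest and League are named in this document.

```r
# Toy stand-in for the match table; the values are invented and
# teamHome / teamGuest are hypothetical column names.
matches <- data.frame(
  League    = c("Primera Division", "Primera Division", "Premier League"),
  teamHome  = c("Real Madrid", "FC Barcelona", "Arsenal"),
  teamGuest = c("Sevilla", "Real Madrid", "Chelsea"),
  eloHome   = c(1950, 1980, 1800),
  eloGuest  = c(1750, 1940, 1820)
)

# Keep every match in which Real Madrid played, at home or away.
real_madrid_data <- subset(
  matches,
  teamHome == "Real Madrid" | teamGuest == "Real Madrid"
)
print(real_madrid_data)
```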

Test of Hypothesis

  • Hypothesis testing is a statistical method used to make inferences about a population based on sample data.
  • The hypothesis testing procedure relies on point estimates computed from a random sample.
  • H0 is the Null Hypothesis.
  • H1 is the Alternative Hypothesis.
  • We’ll apply a hypothesis test to a specific use case: finding out whether the performance of Real Madrid (a football club in La Liga, Spain) differs between home and away games.

Real Madrid

Real Madrid is a football club based in Madrid, Spain. The club compete in La Liga, the top tier of Spanish football.

In domestic football, the club have won 71 trophies: a record 36 La Liga titles, 20 Copa del Rey, 13 Supercopa de España, a Copa Eva Duarte, and a Copa de la Liga.[14] In international football, Real Madrid have won a record 34 trophies: a record 15 European Cup/UEFA Champions League titles, a record six UEFA Super Cups, two UEFA Cups, a joint record two Latin Cups, a record one Iberoamerican Cup, and a record eight FIFA Club World championships.

Real Madrid’s Home Vs Away Goals

El Clásico Record for Real Madrid

El Clásico, meaning “The Classic”, is the name given to any football match between rival clubs Barcelona and Real Madrid.

Paired T-Test Hypothesis

  • The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure used to determine whether the mean difference between two sets of observations is zero.
  • We are using a Paired T-test to test whether a statistically significant difference exists between the Elo ratings of Real Madrid’s home and guest games.

Important Terms for Paired T-Test

  • H0: Null Hypothesis
  • H1: Alternative Hypothesis
  • n-1: Degrees of freedom, where n is the number of paired observations
  • di: Difference between the two paired values for the ith observation
  • sd: Standard deviation of the differences
  • t: Test statistic
  • p: p-value
  • \(\alpha\): Significance Level

Steps to complete Paired T-Test

  1. Hypothesis Formulation
  2. Calculate the Mean Difference
  3. Calculate the Standard Deviation of Differences
  4. Calculate the t-statistic
  5. Calculate the p-value
  6. Conclusion and Inference

Hypothesis Formulation

Null Hypothesis H0: There is no difference between Real Madrid’s mean Elo ratings at home and away.

H0 : \(\mu\)home = \(\mu\)away
where \(\mu\)home is the mean Elo rating at home and \(\mu\)away is the mean Elo rating away.

Alternative Hypothesis H1: There is a significant difference between Real Madrid’s Elo ratings at home and away.

H1 : \(\mu\)home \(≠\) \(\mu\)away
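
Because the test is paired, the same two hypotheses can also be written in terms of the population mean of the per-match differences. The symbol \(\mu_d\) is introduced here only for illustration:

\(\mu_d = \mu_{home} - \mu_{away}, \qquad H_0 : \mu_d = 0 \quad \text{versus} \quad H_1 : \mu_d \neq 0\)

This is the form the t-statistic works with: it measures how far the sample mean difference lies from 0 in units of its standard error.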

Calculate the Mean Difference

For each match, calculate the difference between the Elo ratings of the home and guest teams: di = eloHome,i - eloGuest,i, where eloHome,i and eloGuest,i are the home and guest Elo ratings in match i.

Formula to calculate the average of these differences \(\bar{d}\):

\(\bar{d} = \frac{1}{n} \sum_{i=1}^n d_i\)

[1] "The Average Mean Difference = 1.34202327536841"
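
The mean-difference formula can be sketched in R as follows. The Elo values are made up for illustration and are not taken from the EUfootball data; the variable names are assumptions as well.

```r
# Toy Elo ratings, invented for illustration (not from the EUfootball data).
elo_home  <- c(1900, 1650, 1880, 1700, 1920, 1600)
elo_guest <- c(1620, 1890, 1700, 1910, 1580, 1930)

d     <- elo_home - elo_guest   # per-match differences d_i
n     <- length(d)              # number of paired observations
d_bar <- sum(d) / n             # (1/n) * sum of d_i, identical to mean(d)

paste("The Average Mean Difference =", d_bar)
```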

Calculate the Standard Deviation of Differences

Formula to calculate the standard deviation of the differences, sd:

\(s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (d_i - \bar{d})^2}\)

[1] "The Standard Deviation Difference= 305.299316001962"
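
A minimal sketch of the same formula in R, again on made-up differences rather than the real data. The n - 1 divisor matches what R's built-in sd() uses, so the two results should agree.

```r
# Toy per-match differences d_i, invented for illustration.
d <- c(280, -240, 180, -210, 340, -330)

n     <- length(d)
d_bar <- mean(d)

# Sample standard deviation of the differences, written out term by term.
s_d <- sqrt(sum((d - d_bar)^2) / (n - 1))

paste("The Standard Deviation Difference =", s_d)
paste("Built-in sd(d) =", sd(d))
```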

Calculate the t-statistic

Formula to calculate the test statistic, t:

\(t = \frac{\bar{d}}{s_d / \sqrt{n}}\)

[1] "The test statistics, t= 0.0856891855655436"
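
As a hedged illustration, the t-statistic can be computed by hand and compared with the value that t.test() reports for the same paired vectors. The Elo values below are made up and serve only to show that the two routes agree.

```r
# Toy Elo ratings, invented for illustration (not from the EUfootball data).
elo_home  <- c(1900, 1650, 1880, 1700, 1920, 1600)
elo_guest <- c(1620, 1890, 1700, 1910, 1580, 1930)

d <- elo_home - elo_guest

# t = d_bar / (s_d / sqrt(n))
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# The paired t-test returns the same statistic.
t_builtin <- unname(t.test(elo_home, elo_guest, paired = TRUE)$statistic)

paste("Manual t =", t_manual)
paste("t.test() t =", t_builtin)
```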

Calculate the p-value

The p-value can be calculated using R’s built-in t.test() function. We test at the 5% significance level (\(\alpha\) = 0.05), which corresponds to a 95% confidence level.

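# Paired t-test on the home and guest Elo ratings.
# real_madrid_data is assumed to hold Real Madrid's matches.
# t.test() has no 'method' argument, so it is omitted here.
t_test_result <- t.test(real_madrid_data$eloHome,
                        real_madrid_data$eloGuest,
                        paired = TRUE, conf.level = 0.95)
p <- t_test_result$p.value
paste("The p-value =", p, sep = " ")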
[1] "The p-value = 0.931758747782724"
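
The reported p-value can be cross-checked by hand from the summary statistics above, using the two-sided tail area of the t-distribution with n - 1 degrees of freedom. The sample size is not stated in this document; n = 380 below is an assumption (10 seasons of 38 league matches), chosen because it reproduces the reported t-statistic.

```r
# Values reported earlier in this document.
d_bar <- 1.34202327536841   # mean difference
s_d   <- 305.299316001962   # standard deviation of the differences

# ASSUMPTION: n is not given in the source; 380 = 10 seasons x 38 matches.
n <- 380

t_stat  <- d_bar / (s_d / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

paste("t =", t_stat)
paste("p =", p_value)
```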

Conclusion and Inference

The goal of the test was to determine if there is a significant difference between Real Madrid’s Elo ratings when playing at home versus away.

The t-statistic is t = 0.0857, which is a very small value. This suggests that the difference between home and away Elo ratings is minimal compared to the variability within the data.

The p-value is p = 0.9318, which is much higher than the chosen significance level of 0.05. We therefore fail to reject the null hypothesis.

In other words, the data provide no evidence of a statistically significant difference between Real Madrid’s Elo ratings at home and away.
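
The same conclusion can be read from a 95% confidence interval for the mean difference: if the interval contains 0, the null hypothesis is not rejected at the 5% level. The sketch below uses the summary statistics reported above and again assumes n = 380, which is not stated in this document.

```r
# Values reported earlier in this document.
d_bar <- 1.34202327536841
s_d   <- 305.299316001962
n     <- 380      # ASSUMPTION: sample size is not given in the source
alpha <- 0.05

se     <- s_d / sqrt(n)                    # standard error of the mean difference
t_crit <- qt(1 - alpha / 2, df = n - 1)    # two-sided critical value

ci <- c(lower = d_bar - t_crit * se, upper = d_bar + t_crit * se)

# Decision rule: reject H0 only when p < alpha.
p_value  <- 2 * pt(-abs(d_bar / se), df = n - 1)
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

print(ci)
print(decision)
```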

Real Madrid’s Elo Home VS Away